Large Margin Training of Acoustic Models for Speech Recognition
Authors
Abstract
Fei Sha — Advisor: Prof. Lawrence K. Saul

Automatic speech recognition (ASR) depends critically on building acoustic models for linguistic units. These acoustic models usually take the form of continuous-density hidden Markov models (CD-HMMs), whose parameters are obtained by maximum likelihood estimation. Recently, however, there has been growing interest in discriminative methods for parameter estimation in CD-HMMs.

This thesis applies the idea of large margin training to parameter estimation in CD-HMMs. The principles of large margin training have been intensively studied, most prominently in support vector machines (SVMs). In SVMs, large margin training presents an attractive conceptual framework because it provides theoretical guarantees that balance model complexity against generalization. It also presents an attractive computational framework because it casts many learning problems as tractable convex optimizations.

This thesis extends and develops large margin methods for estimating the parameters of acoustic models for ASR. As in SVMs, the starting point is to postulate that correct and incorrect classifications are separated by a large margin; model parameters are then optimized to maximize this margin. This thesis presents algorithms for training Gaussian mixture models both as multiway classifiers in their own right and as individual components of larger models (e.g., observation models in CD-HMMs). The new techniques differ from previous discriminative methods for ASR in their goal of margin maximization. Additionally, the new techniques lead to efficient algorithms based on convex optimizations. This thesis evaluates the utility of large margin training on two benchmark problems in acoustic modeling: phonetic classification and recognition on the TIMIT speech database.
In both tasks, large margin systems obtain significantly better performance than systems trained by maximum likelihood estimation or competing discriminative frameworks, such as conditional maximum likelihood and minimum classification error. This thesis also examines the utility of subgradient and extragradient methods, both of which were recently proposed for large margin training in domains other than ASR. Comparative experimental results suggest that our learning methods both scale better and yield better performance. The thesis concludes with brief discussions of future research directions, including the application of large margin training techniques to large vocabulary ASR.
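The core idea above — optimize model parameters so that the score of the correct class beats every competitor by a fixed margin, penalizing violations with a hinge loss — can be illustrated with a toy sketch. This is not the thesis's actual algorithm (which trains full Gaussian mixtures via convex optimization over positive semidefinite matrices); it is a minimal, hypothetical illustration using plain centroids and gradient descent on a multiclass hinge loss:

```python
import numpy as np

# Toy sketch of margin-based training for a multiway distance classifier.
# Each class c scores a point x by squared distance to a centroid mu_c,
# and training minimizes a hinge loss that requires the correct class's
# distance to undercut every competitor's by at least `margin`.
# All names and data here are illustrative, not from the thesis.

rng = np.random.default_rng(0)

# Two well-separated 2-D classes of 50 points each.
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)),
               rng.normal(+2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

mu = rng.normal(0.0, 0.1, (2, 2))   # learned class centroids
lr, margin = 0.05, 1.0

for _ in range(200):
    grad = np.zeros_like(mu)
    for x, c in zip(X, y):
        d = ((x - mu) ** 2).sum(axis=1)          # distance to each centroid
        for k in range(2):
            if k != c and d[c] + margin > d[k]:  # margin constraint violated
                grad[c] += 2 * (mu[c] - x)       # pull correct centroid in
                grad[k] -= 2 * (mu[k] - x)       # push competitor away
    mu -= lr * grad / len(X)

pred = np.argmin(((X[:, None, :] - mu[None]) ** 2).sum(axis=2), axis=1)
print("training accuracy:", (pred == y).mean())
```

The thesis's actual formulation replaces the simple squared distance with a quadratic discriminant per mixture component and keeps the problem convex in the model parameters; this sketch only conveys the margin-violation-driven update that both share.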
Similar resources
Large Margin Training of Continuous Density Hidden Markov Models
Continuous density hidden Markov models (CD-HMMs) are an essential component of modern systems for automatic speech recognition (ASR). These models assign probabilities to the sequences of acoustic feature vectors extracted by signal processing of speech waveforms. In this chapter, we investigate a new framework for parameter estimation in CD-HMMs. Our framework is inspired by recent parallel t...
Large-margin conditional random fields for single-microphone speech separation
Conditional random field (CRF) formulations for singlemicrophone speech separation are improved by large-margin parameter estimation. Speech sources are represented by acoustic state sequences from speaker-dependent acoustic models. The large-margin technique improves the classification accuracy of acoustic states by reducing generalization error in the training phase. Non-linear mappings inspi...
Large-Margin Gaussian Mixture Modeling for Automatic Speech Recognition
Discriminative training for acoustic models has been widely studied to improve the performance of automatic speech recognition systems. To enhance the generalization ability of discriminatively trained models, a large-margin training framework has recently been proposed. This work investigates large-margin training in detail, integrates the training with more flexible classifier structures such...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Publication date: 2006